AIbase

# Reinforcement Learning Fine-Tuning

Deductive Reasoning Qwen 32B
MIT
A model fine-tuned from Qwen 2.5 32B Instruct with reinforcement learning, designed to solve challenging deductive reasoning problems in the Temporal Clue dataset.
Large Language Model, Transformers, English
OpenPipe
1,669
39
Tifa DeepsexV2 7b MGRPO Safetensors GGUF
Apache-2.0
Tifa-DeepsexV2-7b-MGRPO-safetensors is a bilingual (Chinese and English) large language model built on the transformers library. It was optimized through incremental pre-training, supervised fine-tuning, and reinforcement learning, and is suited to role-playing and chain-of-thought tasks.
Large Language Model, Supports Multiple Languages
mradermacher
283
1
© 2025 AIbase